reference game
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- North America > United States > California (0.04)
Emergent Communication
Recall that m̂_c(u) is exactly the listener's decoder in the IB framework (see Section 3.1.1). Therefore, any other decoder would yield an upper bound on the informativeness loss term. Notice that under our assumptions, m̂_c is a Gaussian mixture, whereas the speaker's beliefs are simply Gaussian. All the systems with the same k form an equivalence class, and the canonical system within each class is the one with minimal k. These canonical systems are the natural ones to prefer, because they can attain the optimum for a given complexity with a minimal codebook.
- Europe > France (0.05)
- North America > United States (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Context informs pragmatic interpretation in vision-language models
Tan, Alvin Wei Ming, Prystawski, Ben, Boyce, Veronica, Frank, Michael C.
Iterated reference games - in which players repeatedly pick out novel referents using language - present a test case for agents' ability to perform context-sensitive pragmatic reasoning in multi-turn linguistic environments. We tested humans and vision-language models on trials from iterated reference games, varying the given context in terms of amount, order, and relevance. Without relevant context, models were above chance but substantially worse than humans. However, with relevant context, model performance increased dramatically over trials. Few-shot reference games with abstract referents remain a difficult task for machine learning models.
- Europe > Austria > Vienna (0.14)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- North America > United States > California > Santa Clara County > Stanford (0.04)
Are Multimodal Large Language Models Pragmatically Competent Listeners in Simple Reference Resolution Tasks?
Junker, Simeon, Ali, Manar, Koch, Larissa, Zarrieß, Sina, Buschmeier, Hendrik
We investigate the linguistic abilities of multimodal large language models in reference resolution tasks featuring simple yet abstract visual stimuli, such as color patches and color grids. Although the task may not seem challenging for today's language models, being straightforward for human dyads, we consider it to be a highly relevant probe of the pragmatic capabilities of MLLMs. Our results and analyses indeed suggest that basic pragmatic capabilities, such as context-dependent interpretation of color descriptions, still constitute major challenges for state-of-the-art MLLMs.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
Success and Cost Elicit Convention Formation for Efficient Communication
Vaduguru, Saujas, Hua, Yilun, Artzi, Yoav, Fried, Daniel
Humans leverage shared conversational context to become increasingly successful and efficient at communicating over time. One manifestation of this is the formation of ad hoc linguistic conventions, which allow people to coordinate on short, less costly utterances that are understood using shared conversational context. We present a method to train large multimodal models to form conventions, enabling efficient communication. Our approach uses simulated reference games between models, and requires no additional human-produced data. In repeated reference games involving photographs and tangram images, our method enables models to communicate efficiently with people: reducing the message length by up to 41% while increasing success by 15% over the course of the interaction. Human listeners respond faster when interacting with our model that forms conventions. We also show that training based on success or cost alone is insufficient - both are necessary to elicit convention formation.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Europe > Italy > Tuscany > Florence (0.04)
- Research Report (1.00)
- Questionnaire & Opinion Survey (0.68)